Role of Secondary Motifs in Fast Folding Polymers: A Dynamical Variational Principle

A fascinating and open question challenging biochemistry, physics and even geometry is the presence 
of highly regular motifs such as α-helices in the folded state of biopolymers and proteins. Stimulating 
explanations ranging from chemical propensity to simple geometrical reasoning have been invoked 
to rationalize the existence of such secondary structures. We formulate a dynamical variational 
principle for selection in conformation space based on the requirement that the backbone of the 
native state of biologically viable polymers be rapidly accessible from the denatured state. The 
variational principle is shown to result in the emergence of helical order in compact structures. 



A fundamental problem in everyday life is that of packing,
with examples ranging from fruits in a grocery, clothes and
personal belongings in a suitcase, atoms and colloidal
particles in crystals and glasses, to amino acids in the
folded state of proteins. The simplest packing problem
consists of determining the spatial arrangement that
accommodates the highest packing density of its constituent
entities, with the result being a crystalline structure.
Besides packing considerations, dynamical effects play a
significant role when rapid packing/unpacking is entailed,
as in the formation of amorphous glasses, where
crystallization is dynamically thwarted, or in the more
familiar suitcase problem.

Fast packing has been recognized as a central issue for
biopolymers, such as proteins, since the early work of
Levinthal. Further, the native conformations display
extremely regular motifs, such as α-helices or β-sheets. In
this Letter we postulate a direct connection between the
dynamics of rapid folding and the emergence of secondary
motifs in the native state conformations. In fact, an
intuitive approach to rapid and reproducible folding might
be to create neat patterns of lower dimensional manifolds
than the physical space and bend and curl them into the
final folded state. For proteins, secondary structures such
as α-helices and β-sheets are indeed patterns in low
dimensions.

There are two key aspects distinguishing a protein from a
generic heteropolymer: the specially selected sequence of
amino acids and the three-dimensional structure that it
folds reversibly into. For a given target native structure,
the selection mechanism in sequence space is the principle
of minimal frustration. The chosen sequences are such that
their target native states are reached through a funnel-like
landscape which facilitates the harmonious fitting together
of pieces to form the whole.

The three-dimensional structure impacts on the functionality
of the protein and a fascinating issue is the elucidation of
the selection mechanism in conformation space that picks out
certain viable structures from the innumerable ones with a
given compactness. Earlier studies have shown that there is
a direct link between viable native conformations and high
designability. Recently, it was observed that the natural
folds of proteins have a much larger density of nearby
structures than generic (artificial) conformations of the
same character and that the exceedingly large geometrical
accessibility of natural proteins may be related to the
presence of secondary motifs.

The realization that proteins have secondary structures
arose with early crystallographic studies and the brilliant
deduction of Pauling et al. of the ability of an α-helix of
the correct pitch to accommodate hydrogen bonds, thus
promoting its stability. Inspired by the findings of
Pauling, helix-coil transition models have been used to
study the thermodynamics of helix formation. The models
encompass features that ensure the helical nature of the
low-energy states by assuming first that monomers can be in
a helical state and by then introducing cooperative
interactions that favor helical regions. It is interesting
to note, however, that the number of hydrogen bonds is
nearly the same when a sequence is in an unfolded structure
in the presence of a polar solvent or in its native state
rich in secondary structure content. It has also been
suggested that the α-helix is an energetically favorable
conformation for main-chain atoms but the side-chain suffers
from a loss of entropy. Nelson et al. have shown both
numerically and experimentally that non-biological oligomers
fold reversibly like proteins into a specific
three-dimensional structure with high helical content driven
only by solvophobic interactions. Recent studies have
attempted to explain the emergence of secondary structure
from geometrical principles rather than invoking detailed
chemistry. Despite the concerted efforts of several groups,
a simple general explanation remains elusive. In particular,
the work of Yee et al., Hunt et al., and Socci et al. has
shown that compactness alone can only account for a small
secondary structure content. These facts are also
corroborated by the recent study of the kinetics of
homopolymer collapse, where no evidence was found for the
formation of local regular structures.

We propose a selection mechanism in structure space in the
form of a variational principle postulating that, among all
possible native conformations, a protein backbone will
attain only those which are optimal under the action of
evolutionary pressure favouring rapid folding. Our goal is
to elucidate the role played by the bare native backbone
independent of the selection in sequence space and hence of
the (imperfectly known) inter-amino-acid potentials. We
therefore choose to employ a Go-like model with no other
interaction that promotes or disfavours secondary
structures. The model is a sequence-independent limiting
case of minimal frustration which, for a given target native
state conformation, favours the formation of native
contacts: the energy of a sequence in a conformation is
simply the negative of the number of contacts in common with
the target conformation. We consider two non-consecutive
amino acids to be in contact if their separation is below a
cutoff r = 6.5 Å (the results are qualitatively similar when
slightly different values of r in the range 6-8 Å are
chosen).

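The energy of structure $\Gamma$ in the Go model is given by

$$H(\Gamma) = -\frac{1}{2}\sum_{i \neq j} \Delta_{ij}(\Gamma)\,\Delta_{ij}(\Gamma_0), \tag{1}$$

where the sum is taken over all pairs of amino acids,
$\Gamma_0$ is the target structure, and $\Delta_{ij}(\Gamma)$
is the contact map of structure $\Gamma$:

$$\Delta_{ij}(\Gamma) = \begin{cases} 1 & R_{ij} < r \ \text{and}\ |i-j| > 2; \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

where $R_{ij}$ is the distance between amino acids i and j.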
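
As a minimal sketch of eqs. (1) and (2), the contact map and the Go energy can be computed from bead coordinates as follows. The function names `contact_map` and `go_energy`, the NumPy array representation and the default arguments are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def contact_map(coords, cutoff=6.5):
    """Eq. (2): entry (i, j) is 1 if R_ij < cutoff and |i - j| > 2, else 0."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise distance matrix R_ij built by broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    return ((dist < cutoff) & (sep > 2)).astype(int)


def go_energy(cmap, target_cmap):
    """Eq. (1): minus the number of contacts shared with the target map.

    Both maps are symmetric, so the full sum counts every pair twice;
    the factor 1/2 compensates for that.
    """
    return -0.5 * float((np.asarray(cmap) * np.asarray(target_cmap)).sum())
```

With this convention, `go_energy(contact_map(x), contact_map(x0))` reaches its minimum, minus the number of native contacts, when the conformation `x` reproduces every contact of the target `x0`.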

The polypeptide chain is modelled as a chain of beads
subject to steric constraints. We adopted a discrete
representation similar to the one of Covell and Jernigan, in
which each bead occupies a site of an FCC lattice with
lattice spacing equal to 3.8 Å. Such a representation is
able to describe the backbone of natural proteins to better
than 1 Å rmsd per residue (equal to the best experimental
resolution) and preserves typical torsional angles. All
discretized structures were subject to suitable constraints:
any two non-consecutive residues cannot be closer than
4.65 Å due to excluded volume effects, and the distance d
between consecutive residues can fluctuate in the range
2.6 Å < d < 4.7 Å. Such constraints were determined by an
analysis of the coarse-grainings of several proteins of
intermediate length (≈ 100 residues). In order to enforce a
realistic global compactness for a backbone of length L, the
number of contacts in all the target structures considered
was chosen to be around N = 1.9L while, locally, no residue
was allowed to make contact with four or more consecutive
residues.
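
The constraints listed above can be collected into a single validity check, sketched below. The function names, the return convention and the way the "four or more consecutive residues" rule is tested (as a run of consecutive contact partners in one row of the contact map) are assumptions made for illustration.

```python
import numpy as np


def longest_run(row):
    """Length of the longest run of consecutive nonzero entries in a row."""
    best = run = 0
    for value in row:
        run = run + 1 if value else 0
        best = max(best, run)
    return best


def check_structure(coords, cutoff=6.5, bond_min=2.6, bond_max=4.7,
                    excluded=4.65, max_run=3):
    """Return (is_valid, n_contacts) for a chain of bead coordinates in Å."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])

    # Consecutive residues: bond_min < d < bond_max.
    bonds = dist[sep == 1]
    bonds_ok = bool(np.all((bonds > bond_min) & (bonds < bond_max)))
    # Non-consecutive residues: no pair closer than the excluded-volume radius.
    sterics_ok = bool(np.all(dist[sep >= 2] >= excluded))
    # Local rule: no residue touches four or more consecutive residues.
    cmap = (dist < cutoff) & (sep > 2)
    local_ok = all(longest_run(row) <= max_run for row in cmap)

    n_contacts = int(cmap.sum()) // 2  # each contact appears twice in cmap
    return bonds_ok and sterics_ok and local_ok, n_contacts
```

The returned `n_contacts` can then be compared with 1.9L; the text only says "around", so the tolerance for that comparison is left to the caller.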

In order to assess the validity of the variational
principle, it is necessary to evaluate the typical time,
$t(\Gamma_0)$, taken to fold into a given target structure,
$\Gamma_0$, followed by a selection of the structures
$\Gamma_0$ that have the smallest folding times. To do this,
an initial set of ten conformations was generated by
collapsing a loose chain starting from random initial
conditions. In each case, we modified the random initial
conformation by using Monte Carlo (MC) dynamics: we move up
to 3 consecutive beads to unoccupied discrete positions that
do not violate any of the physical constraints and accept
the moves according to the standard Metropolis rule. The
energy is given by eq. (1), while the temperature for the MC
dynamics was set to 0.35. This value was chosen in
preliminary runs so that it was higher than the temperature
below which the sequence is trapped in metastable states but
comparable to the folding transition temperature, so that
conformations with significant overlap with the native state
are sampled in thermal equilibrium.
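
The standard Metropolis acceptance step mentioned above can be sketched as follows. The function name and the `rng` argument are illustrative, and the temperature is assumed to be expressed in units of the contact energy with the Boltzmann constant set to 1.

```python
import math
import random


def metropolis_accept(e_old, e_new, temperature=0.35, rng=random):
    """Standard Metropolis rule for a proposed move.

    Moves that do not raise the energy are always accepted; uphill moves
    are accepted with probability exp(-(e_new - e_old) / temperature).
    `rng` only needs a random() method returning a float in [0, 1).
    """
    delta = e_new - e_old
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

At temperature 0.35, breaking a single native contact (an energy increase of 1) is accepted with probability exp(-1/0.35), roughly 6%.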

For each structure, as a measure of the folding time 
we took the median over various attempts (typically 41) 
of the total number of Monte Carlo moves necessary to 
form a pre-assigned fraction of native contacts, typically 
66%, starting from a random conformation. Our results 
were unaltered on increasing this fraction to 75%; in- 
deed, this fraction could be progressively increased to- 
wards 100% with successive generations without increase 
in the computational cost since better and better folders 
are obtained. 
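
The folding-time measure described above, a median of first-passage times to a preset fraction of native contacts, might be organised as in this sketch. The names `first_passage_step` and `median_folding_time` and the callable `run_attempt` are hypothetical; each attempt is assumed to run until the threshold is reached.

```python
import statistics


def first_passage_step(native_counts, n_native, fraction=0.66):
    """First MC step at which the required fraction of native contacts exists.

    native_counts[k] is the number of native contacts formed after k moves.
    Returns None if the threshold is never reached in the given trajectory.
    """
    for step, count in enumerate(native_counts):
        if count >= fraction * n_native:
            return step
    return None


def median_folding_time(run_attempt, attempts=41):
    """Median over independent attempts of the number of MC moves to fold.

    run_attempt is a zero-argument callable that starts from a fresh random
    conformation and returns the number of moves needed to reach the threshold.
    """
    return statistics.median(run_attempt() for _ in range(attempts))
```

Using an odd number of attempts, such as the 41 quoted in the text, makes the median coincide with one of the measured times.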

A new generation of ten structures is created by
"hybridizing" pairs of structures of the previous
generation, ensuring that structures with small folding
times are hybridized more and more frequently as the number
of generations, g, increases. To do this, each of the two
distinct parent structures to be hybridized, $\Gamma_1$ and
$\Gamma_2$, is chosen with probability proportional to
$\exp[-(g-1)\,t/1000]$, where g is the index of the current
generation (initially equal to 1) and t is the median
folding time of the structure. Then, a hybrid map is created
by taking the union of the two parent maps:

$$\Delta^{\mathrm{union}}_{ij} = \max\left(\Delta_{ij}(\Gamma_1),\, \Delta_{ij}(\Gamma_2)\right). \tag{3}$$
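
The parent selection and the union map of eq. (3) can be sketched as below. Two details are assumptions rather than statements from the text: the folding times are shifted by their minimum before exponentiation (this leaves the normalized probabilities unchanged and avoids underflow), and the second parent is drawn after setting the first parent's weight to zero, with a uniform fallback if every remaining weight underflows.

```python
import math
import random

import numpy as np


def selection_weights(folding_times, generation, scale=1000.0):
    """Weights proportional to exp[-(g - 1) * t / scale], shifted by min(t)."""
    t_min = min(folding_times)
    return [math.exp(-(generation - 1) * (t - t_min) / scale)
            for t in folding_times]


def pick_parents(folding_times, generation, rng=random):
    """Draw the indices of two distinct parent structures."""
    weights = selection_weights(folding_times, generation)
    indices = list(range(len(folding_times)))
    first = rng.choices(indices, weights=weights)[0]
    rest = list(weights)
    rest[first] = 0.0  # exclude the first parent from the second draw
    if sum(rest) == 0.0:
        # All other weights underflowed to zero: draw uniformly instead.
        rest = [0.0 if k == first else 1.0 for k in indices]
    second = rng.choices(indices, weights=rest)[0]
    return first, second


def union_map(map_a, map_b):
    """Eq. (3): elementwise maximum of the two parent contact maps."""
    return np.maximum(np.asarray(map_a), np.asarray(map_b))
```

At generation 1 all weights equal 1, so parents are drawn uniformly; the bias towards fast folders grows linearly with the generation index.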



Because it is not guaranteed that $\Delta^{\mathrm{union}}$
corresponds to a three-dimensional structure obeying the
same physical constraints as $\Gamma_1$ and $\Gamma_2$, the
corresponding hybrid $\Gamma$ is constructed by taking one
of the two parent structures (or alternatively a random one)
as the starting conformation and carrying out MC dynamics
favouring the formation of each of the contacts in the union
map (i.e. using eq. (1) with $\Delta_{ij}(\Gamma_0)$
substituted by $\Delta^{\mathrm{union}}_{ij}$). The dynamics
is carried out starting from a temperature of 0.7 and then
decreasing it gradually over a sufficiently long time
(typically thousands of MC steps) to achieve the maximum
possible overlap with the union map, while simultaneously
maintaining the realistic compactness. The resulting hybrid
structure is typically midway between the two parent
structures, in that it inherits native contacts from both of
them.

We adopted the following definition in order to obtain an
objective and unbiased way to quantitatively estimate the
presence of secondary content: a given residue, i, was
defined to belong to a secondary motif if, for some j, one
of these conditions on the contact map $\Delta_{ij}$ held:

a) $\Delta_{i-1,j-1} = \Delta_{i,j} = \Delta_{i+1,j+1} = \Delta_{i,j+1} = \Delta_{i+1,j+2} = \Delta_{i-1,j} = 1$;

b) $\Delta_{i+1,j-1} = \Delta_{i,j} = \Delta_{i-1,j+1} = \Delta_{i,j+1} = \Delta_{i+1,j} = \Delta_{i-1,j+2} = 1$.

The former identifies helices and parallel β-sheets, the
latter anti-parallel β-sheets; in natural proteins these can
be identified by visual inspection of contact matrices,
where they appear as thick bands parallel or orthogonal to
the diagonal, respectively.
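
A minimal sketch of the secondary-content count follows. It assumes that condition (a) requires contacts on two adjacent diagonals of the contact map (entries (i-1,j-1), (i,j), (i+1,j+1) and (i-1,j), (i,j+1), (i+1,j+2)) and that condition (b) requires the analogous pattern on two adjacent anti-diagonals; these offset tables, like the function names, are assumptions made for illustration.

```python
import numpy as np

# Offsets (di, dj) relative to (i, j); all six entries must equal 1.
PARALLEL = [(-1, -1), (0, 0), (1, 1), (-1, 0), (0, 1), (1, 2)]      # condition a)
ANTIPARALLEL = [(1, -1), (0, 0), (-1, 1), (1, 0), (0, 1), (-1, 2)]  # condition b)


def in_motif(cmap, i, offsets):
    """True if some j makes every offset entry of the contact map equal to 1."""
    n = len(cmap)
    for j in range(n):
        satisfied = True
        for di, dj in offsets:
            a, b = i + di, j + dj
            # Entries outside the map count as missing contacts.
            if not (0 <= a < n and 0 <= b < n) or cmap[a][b] != 1:
                satisfied = False
                break
        if satisfied:
            return True
    return False


def secondary_content(cmap):
    """Number of residues satisfying condition a) or condition b)."""
    cmap = np.asarray(cmap)
    return sum(
        1 for i in range(len(cmap))
        if in_motif(cmap, i, PARALLEL) or in_motif(cmap, i, ANTIPARALLEL)
    )
```

For an idealised helix in which every residue i touches residues i+3 and i+4, this count includes all residues except the two chain ends, where the band pattern runs off the edge of the map.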

The upper plot of Fig. 1 shows the decrease of the typical 
folding time over the generations for chains of length 25, 
while the middle panel shows the accompanying increase 
in the number of residues in secondary motifs (secondary 
content). The bottom panel shows a milder decrease of 
the contact order (i.e. a larger number of short-range 
contacts) as the generations evolved, in agreement with 
the experimental findings of Plaxco et al.
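
For reference, a contact order can be computed from a contact map as sketched below. The text does not spell out its definition, so this sketch assumes the relative contact order commonly attributed to Plaxco et al., the mean sequence separation of contacting pairs divided by the chain length, applied here at the residue level; the function name is illustrative.

```python
import numpy as np


def relative_contact_order(cmap):
    """Mean sequence separation of contacts divided by the chain length."""
    cmap = np.asarray(cmap)
    length = len(cmap)
    # Use the upper triangle so that each contact is counted once.
    i, j = np.nonzero(np.triu(cmap, k=1))
    if len(i) == 0:
        return 0.0
    return float((j - i).sum()) / (length * len(i))
```

A map dominated by short-range contacts, such as the bands near the diagonal produced by helices, gives a low value; long-range contacts raise it.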

One of the optimal structures of length 25 is shown in
Figure 2a. Due to the absence of any chirality bias in our
structure space exploration, the helix does not have a
constant handedness. The signature of the secondary motifs
in the optimal structures is clearly visible in the contact
maps of Figure 3, which are not sensitive to structure
chirality. Strikingly, the variational principle selects
conformations with significant secondary content as those
facilitating the fastest folding. The correlation of the
emergence of secondary structures with the decrease of
folding times is shown in the plot of Fig. 4. We verified
that the hybridization procedure is not biased towards low
contact order by iterating it for various generations while
hybridizing the structures at random. Even after dozens of
generations, the generated structures had secondary contents
of only about 1/4 to 1/3 of those of the true extremal
structures.

The very high secondary content in optimal conformations was
found to be robust against changes in chain length or
compactness of the target structure. On requiring that the
structure be more compact, bundles of helices emerge [see
Fig. 2b] along with an increase in contact order, signalling
the presence of some longer range contacts, which are
necessary to accommodate the shorter radius of gyration. It
is noteworthy that our calculations lead predominantly to
α-helices and not β-sheets, a fact accounted for by the
demonstration that steric overlaps and the associated loss
of entropy lead to the destabilization of helices in favor
of sheets, the appearance of such sheets only in
sufficiently long proteins, and the much slower folding rate
of β-sheets compared to α-helices. It is remarkable that the
same requirement of rapid folding is sufficient to lead to a
selection in both sequence and structure space, underscoring
the harmony in the evolutionary design of proteins. The
results and strategies presented here ought to be applicable
in protein-engineering contexts, for example by ensuring
optimal dynamical accessibility of the backbone of proteins.
A systematic collection of the rapidly accessible structures
of various lengths should also lead to the creation of
unbiased libraries of protein folds.

FIG. 2. a) RASMOL plot of a structure with very low median
folding time and L = 25. b) Structure with very low median
folding time, L = 25 and higher compactness (all target
conformations were constrained to have a radius of gyration
smaller than 6.5 Å). Optimal compact structures correspond
to helices packed together, as observed in naturally
occurring proteins.

FIG. 3. The panel on the left [right] shows the contact 
map of a structure with a very low [average] median folding 
time. The signature of helices in map (a) is shown by the 
thick bands parallel to the diagonal, while no such patterns 
are observed in the matrix (b) . 

FIG. 1. Evolution of the median folding time (measured in
Monte Carlo steps), secondary structure content and contact
order as a function of the number of generations in the
optimization algorithm for compact structures of length
L = 25. The dashed curve denotes an average over all ten
structures in a given generation, whereas the solid curve
shows the behaviour of the structure at each generation with
the fastest median folding time. Analogous results are
obtained for other runs and for other values of L. The
dramatic decrease of folding time is accompanied by an
equally significant increase in the secondary content.

FIG. 4. Scatter plot of folding time versus secondary
content for structures of length 25 collected over several
generations of the optimization algorithm.